AITopics | working memory

Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory

Neural Information Processing Systems, Dec-27-2025, 03:03:01 GMT

Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks. We also include human behavioral benchmarks as an upper bound for comparison. Our results suggest that AI models replicate some characteristics of WM in the brain, most notably primacy and recency effects, and neural clusters and correlates specialized for different domains and functionalities of WM. In the experiments, we also reveal some limitations in existing models to approximate human behavior. This dataset serves as a valuable resource for communities in cognitive psychology, neuroscience, and AI, offering a standardized framework to compare and enhance WM models, investigate WM's neural underpinnings, and develop WM models with human-like capabilities.

benchmarking human and ais, name change, working memory, (9 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.59)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)

A Definition of AGI

Hendrycks, Dan, Song, Dawn, Szegedy, Christian, Lee, Honglak, Gal, Yarin, Brynjolfsson, Erik, Li, Sharon, Zou, Andy, Levine, Lionel, Han, Bo, Fu, Jie, Liu, Ziwei, Shin, Jinwoo, Lee, Kimin, Mazeika, Mantas, Phan, Long, Ingebretsen, George, Khoja, Adam, Xie, Cihang, Salaudeen, Olawale, Hein, Matthias, Zhao, Kevin, Pan, Alexander, Duvenaud, David, Li, Bo, Omohundro, Steve, Alfour, Gabriel, Tegmark, Max, McGrew, Kevin, Marcus, Gary, Tallinn, Jaan, Schmidt, Eric, Bengio, Yoshua

arXiv.org Artificial Intelligence, Dec-4-2025

The lack of a concrete definition for Artificial General Intelligence (AGI) obscures the gap between today's specialized AI and human-level cognition. This paper introduces a quantifiable framework to address this, defining AGI as matching the cognitive versatility and proficiency of a well-educated adult. To operationalize this, we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition. The framework dissects general intelligence into ten core cognitive domains (including reasoning, memory, and perception) and adapts established human psychometric batteries to evaluate AI systems. Application of this framework reveals a highly "jagged" cognitive profile in contemporary models. While proficient in knowledge-intensive domains, current AI systems have critical deficits in foundational cognitive machinery, particularly long-term memory storage. The resulting AGI scores (e.g., GPT-4 at 27%, GPT-5 at 57%) concretely quantify both rapid progress and the substantial gap remaining before AGI.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.18212

Country:

North America > United States > California (0.28)
Europe > United Kingdom > England (0.27)

Genre: Research Report (0.51)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Education (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Predicting Cognition from fMRI: A Comparative Study of Graph, Transformer, and Kernel Models Across Task and Rest Conditions

Patel, Jagruti, Schöttner, Mikkel, Bolton, Thomas A. W., Hagmann, Patric

arXiv.org Artificial Intelligence, Jul-29-2025

Department of Radiology, Lausanne University Hospital and University of Lausanne (CHUV-UNIL), Lausanne, Switzerland

ABSTRACT

Predicting cognition from neuroimaging data in healthy individuals offers insights into the neural mechanisms underlying cognitive abilities, with potential applications in precision medicine and early detection of neurological and psychiatric conditions. This study systematically benchmarked classical machine learning (Kernel Ridge Regression) and advanced deep learning models (Graph Neural Networks and Transformer-GNNs) for cognitive prediction using Resting-state, Working Memory, and Language task fMRI data from the Human Connectome Project Young Adult (HCP-YA) dataset. Among the methods compared, a GNN combining structural and functional connectivity consistently achieved the highest performance across all fMRI modalities; however, its advantage over Kernel Ridge Regression using functional connectivity alone was not statistically significant. These findings emphasize the importance of selecting appropriate model architectures and feature representations to fully leverage the spatial and temporal richness of neuroimaging data. This study highlights the potential of multimodal graph-aware deep learning models to combine structural and functional connectivity for cognitive prediction, as well as the promise of Transformer-based approaches for capturing temporal dynamics. By providing a comprehensive comparison of models, this work serves as a guide for advancing brain-behavior modeling using fMRI, structural connectivity and deep learning.

INTRODUCTION

Understanding and predicting behavior from neuroimaging data in healthy individuals is crucial for advancing our knowledge of the brain's functional architecture and its relationship to behavior. While significant efforts have focused on patients with neurological or psychiatric disorders (Arbabshirani, Plis, Sui, & Calhoun, 2017; Sabuncu, Konukoglu, & Initiative, 2015), the study of healthy participants remains underexplored. Analyzing brain connectivity in healthy individuals can provide valuable insights into the baseline neural mechanisms underlying behavior, offering a foundation for early prognosis of potential neuro or psychiatric conditions (Bassett & Sporns, 2017; Fornito, Zalesky, & Breakspear, 2015; Lui, Zhou, Sweeney, & Gong, 2016; Zhou, Gennatas, Kramer, Miller, & Seeley, 2012). By examining the intricate patterns of functional and structural connectivity, we can identify biomarkers indicative of brain health, which can serve as early indicators of disease susceptibility (M.

artificial intelligence, machine learning, transformer, (15 more...)

arXiv.org Artificial Intelligence

2507.21016

Country: Europe > Switzerland > Vaud > Lausanne (0.64)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.67)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

THIRDEYE: Cue-Aware Monocular Depth Estimation via Brain-Inspired Multi-Stage Fusion

Ioan, Calin Teodor

arXiv.org Artificial Intelligence, Jun-27-2025

Monocular depth estimation methods traditionally train deep models to infer depth directly from RGB pixels. This implicit learning often overlooks explicit monocular cues that the human visual system relies on, such as occlusion boundaries, shading, and perspective. Rather than expecting a network to discover these cues unaided, we present ThirdEye, a cue-aware pipeline that deliberately supplies each cue through specialised, pre-trained, and frozen networks. These cues are fused in a three-stage cortical hierarchy (V1->V2->V3) equipped with a key-value working-memory module that weights them by reliability. An adaptive-bins transformer head then produces a high-resolution disparity map. Because the cue experts are frozen, ThirdEye inherits large amounts of external supervision while requiring only modest fine-tuning. This extended version provides additional architectural detail, neuroscientific motivation, and an expanded experimental protocol; quantitative results will appear in a future revision.

artificial intelligence, machine learning, specialist, (15 more...)

arXiv.org Artificial Intelligence

2506.20877

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.49)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory

Neural Information Processing Systems, Jan-20-2025, 01:30:16 GMT

Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks.

benchmarking human and ais, decoding, working memory, (7 more...)

Neural Information Processing Systems

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)
Information Technology > Artificial Intelligence > Cognitive Science (0.60)

Improving Factuality with Explicit Working Memory

Chen, Mingda, Li, Yang, Padthe, Karthik, Shao, Rulin, Sun, Alicia, Zettlemoyer, Luke, Ghosh, Gargi, Yih, Wen-tau

arXiv.org Artificial Intelligence, Dec-23-2024

In the realm of long-form text generation, a notable vulnerability of large language models (LLMs) is their propensity for hallucination, wherein the generated text contains factually inaccurate information. By prepending the input prompt with relevant documents from trustworthy sources, retrieval-augmented generation (RAG) (Lewis et al., 2020; Shi et al., 2024) has been shown to be a simple yet effective approach that substantially mitigates the hallucination issue. To further enhance the factual accuracy of model output, various iterative prompting methods have been proposed that build upon RAG. For instance, FLARE (Jiang et al., 2023) generates responses sentence by sentence, and if a newly generated sentence contains low-probability tokens, it retrieves a new set of documents and re-runs RAG to regenerate the sentence. Alternatively, Self-RAG (Asai et al., 2024) employs a self-critic component to verify the correctness of each partial generation and repeatedly queries a retrieval system to update the background knowledge, thereby producing more accurate and faithful responses. While these systems demonstrate significant empirical improvement, they are restricted to the traditional RAG design. Context-relevant knowledge through retrieval is the only online feedback to the model, incorporated as part of the input string.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.18069

Country:

Asia (0.47)
North America > Mexico (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.44)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Mazed and Confused: A Dataset of Cybersickness, Working Memory, Mental Load, Physical Load, and Attention During a Real Walking Task in VR

Setu, Jyotirmay Nag, Le, Joshua M, Kundu, Ripan Kumar, Giesbrecht, Barry, Höllerer, Tobias, Hoque, Khaza Anuarul, Desai, Kevin, Quarles, John

arXiv.org Artificial Intelligence, Sep-10-2024

Virtual Reality (VR) is quickly establishing itself in various industries, including training, education, medicine, and entertainment, in which users are frequently required to carry out multiple complex cognitive and physical activities. However, the relationship between cognitive activities, physical activities, and familiar feelings of cybersickness is not well understood and thus can be unpredictable for developers. Researchers have previously provided labeled datasets for predicting cybersickness while users are stationary, but there have been few labeled datasets on cybersickness while users are physically walking. Thus, from 39 participants, we collected head orientation, head position, eye tracking, images, physiological readings from external sensors, and the self-reported cybersickness severity, physical load, and mental load in VR. Throughout the data collection, participants navigated mazes via real walking and performed tasks challenging their attention and working memory. To demonstrate the dataset's utility, we conducted a case study of training classifiers in which we achieved 95% accuracy for cybersickness severity classification. The noteworthy performance of the straightforward classifiers makes this dataset ideal for future researchers to develop cybersickness detection and reduction models. To better understand the features that helped with classification, we performed SHAP (SHapley Additive exPlanations) analysis, highlighting the importance of eye tracking and physiological measures for cybersickness prediction while walking. This open dataset can allow future researchers to study the connection between cybersickness and cognitive loads and develop prediction models. This dataset will empower future VR developers to design efficient and effective Virtual Environments by improving cognitive load management and minimizing cybersickness.

cybersickness, dataset, participant, (14 more...)

arXiv.org Artificial Intelligence

2409.06898

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Texas (0.04)
North America > United States > Missouri > Boone County > Columbia (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Consumer Health (1.00)
Leisure & Entertainment > Games > Computer Games (0.68)
Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory

Sikarwar, Ankur, Zhang, Mengmi

arXiv.org Artificial Intelligence, Nov-1-2023

Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks. We also include human behavioral benchmarks as an upper bound for comparison. Our results suggest that AI models replicate some characteristics of WM in the brain, most notably primacy and recency effects, and neural clusters and correlates specialized for different domains and functionalities of WM. In the experiments, we also reveal some limitations in existing models to approximate human behavior. This dataset serves as a valuable resource for communities in cognitive psychology, neuroscience, and AI, offering a standardized framework to compare and enhance WM models, investigate WM's neural underpinnings, and develop WM models with human-like capabilities. Our source code and data are available at https://github.com/ZhangLab-DeepNeuroCogLab/WorM.

benchmarking human and ais, decoding, working memory, (2 more...)

arXiv.org Artificial Intelligence

2307.10768

Genre: Research Report > New Finding (0.53)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)

Dopamine Modulation in a Basal Ganglio-Cortical Network of Working Memory

Neural Information Processing Systems, Apr-6-2023, 16:13:15 GMT

Dopamine exerts two classes of effect on the sustained neural activity in prefrontal cortex that underlies working memory. Direct release in the cortex increases the contrast of prefrontal neurons, enhancing the robustness of storage. Release of dopamine in the striatum is associated with salient stimuli and makes medium spiny neurons bistable; this modulation of the output of spiny neurons affects prefrontal cortex so as to indirectly gate access to working memory and additionally damp sensitivity to noise. Existing models have treated dopamine in one or other structure, or have addressed basal ganglia gating of working memory exclusive of dopamine effects. In this paper we combine these mechanisms and explore their joint effect.

basal ganglio-cortical network, dopamine modulation, working memory

Neural Information Processing Systems

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence (0.46)

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

Peng, Baolin, Galley, Michel, He, Pengcheng, Cheng, Hao, Xie, Yujia, Hu, Yu, Huang, Qiuyuan, Liden, Lars, Yu, Zhou, Chen, Weizhu, Gao, Jianfeng

arXiv.org Artificial Intelligence, Mar-8-2023

Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e.g., task-oriented dialog and question answering. However, applying LLMs to real-world, mission-critical applications remains challenging mainly due to their tendency to generate hallucinations and their inability to use external knowledge. This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules. Our system makes the LLM generate responses grounded in external knowledge, e.g., stored in task-specific databases. It also iteratively revises LLM prompts to improve model responses using feedback generated by utility functions, e.g., the factuality score of an LLM-generated response. The effectiveness of LLM-Augmenter is empirically validated on two types of scenarios, task-oriented dialog and open-domain question answering. LLM-Augmenter significantly reduces ChatGPT's hallucinations without sacrificing the fluency and informativeness of its responses. We make the source code and models publicly available.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2302.12813

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
South America > Brazil > São Paulo (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Consumer Products & Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
